The one-shot classical capacity of a quantum channel quantifies the amount of classical information
that can be transmitted through a single use of the channel such that the error probability is
below a certain threshold. In this work, we show that this capacity is well approximated by a
relative-entropy-type measure defined via hypothesis testing. Combined with a quantum version of
Stein's Lemma, our results give a conceptually simple proof of the well-known
Holevo-Schumacher-Westmoreland Theorem for the capacity of memoryless channels. They also give
general capacity formulas for arbitrary channels.
I. INTRODUCTION

The channel coding theorem for a stationary memoryless
classical-quantum channel has been established by
Holevo [1] and Schumacher and Westmoreland [2]. A
formula for the most general classical-quantum channels has
been obtained by Hayashi and Nagaoka [3]. All of these
results concern the asymptotic regime where the number
of channel uses tends to infinity and the probability of
error is required to tend to zero.

The present work deals with a different scenario where
the channel is used only once, and a finite error probability
is allowed. We provide upper and lower bounds on
the amount of (classical) information that can be transmitted
through one use of the quantum channel such
that the average probability of error is below a certain
value. The bounds are generalizations of similar results
on one-shot capacities of classical channels [4]. Combined
with the Quantum Stein's Lemma [5], they give
a conceptually simple proof of the
Holevo-Schumacher-Westmoreland Theorem [1], [2]. The bounds can also be
directly applied to "many" uses of an arbitrary channel,
where no assumption is made on the channel or the input
states. In the asymptotic limit as the number of
channel uses tends to infinity and the probability of error
is required to tend to zero, the upper bound and the
lower bound coincide and lead to a capacity expression
which is equivalent to that in [3]. These results require
remarkably simple proofs, despite their strength and generality.

Channel coding is closely related to the problem of hy- 
pothesis testing, and this connection has been used in 
several works (see, e.g., @, la-Q)- Here, we use hypothe- 
sis testing very directly to define a relative-entropy-type 
quantity (Section II). Our bounds on the one-shot channel
capacity will then be expressed in terms of this quantity
(Section III). This quantity is similar to the "smooth
min-relative entropy" introduced in [8], but its position
in the smooth entropy framework (see, e.g., [9]) is still to
be clarified.

This work is closely related to recent work of Mosonyi
and Datta [10], who also studied the one-shot classical
capacities of quantum channels. However, the bounds on
the capacity they derive are different from ours (and the
quantitative relation between them is unknown). In particular,
the upper and lower bounds in our work coincide
asymptotically for arbitrary channels, which is not shown
to be true for the bounds in [10].



II. HYPOTHESIS TESTING AND $D_H^\epsilon(\rho\|\sigma)$

Consider a hypothesis testing problem between two
quantum states $\rho$ and $\sigma$, which are density operators acting
on a Hilbert space. We wish to minimize the probability
of guessing $\rho$ when the real state is $\sigma$, subject to
the condition that the probability of guessing $\sigma$ when the
real state is $\rho$ is at most $\epsilon$. Denote this minimum probability
by $p^*(\rho, \sigma, \epsilon)$. Since any binary hypothesis test is
equivalent to a POVM measurement with two elements,
it is easy to verify the following:

$$p^*(\rho, \sigma, \epsilon) = \inf_{\substack{Q:\ 0 \le Q \le I, \\ \mathrm{tr}(Q\rho) \ge 1-\epsilon}} \mathrm{tr}(Q\sigma).$$

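Motivated by this observation we define a new type of
relative entropy:

Definition 1 ($D_H^\epsilon(\rho\|\sigma)$). The hypothesis testing relative
entropy with parameter $\epsilon$ between two quantum states
$\rho$ and $\sigma$ is

$$D_H^\epsilon(\rho\|\sigma) \triangleq \sup_{\substack{Q:\ 0 \le Q \le I, \\ \mathrm{tr}(Q\rho) \ge 1-\epsilon}} \{-\log \mathrm{tr}(Q\sigma)\}.$$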
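
As an illustration of Definition 1, the following is a minimal sketch for the special case where $\rho$ and $\sigma$ commute, so that they reduce to probability vectors `p` and `q` and the operator $Q$ reduces to a vector of acceptance probabilities in [0, 1]. Under that assumption the optimization is a fractional knapsack problem, solved by accepting outcomes in order of increasing likelihood ratio q_i / p_i (the Neyman-Pearson test). The function name and the use of base-2 logarithms are choices made for this sketch, not notation from the text.

```python
import math


def hypothesis_testing_relative_entropy(p, q, eps):
    """Return D_H^eps(p||q) in bits for probability vectors p and q.

    Assumes the commuting (classical) case: Q is a vector with entries
    in [0, 1], tr(Q rho) = sum_i p_i Q_i and tr(Q sigma) = sum_i q_i Q_i.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    need = 1.0 - eps  # mass that Q must still accept under p
    # Outcomes with p_i = 0 only cost weight under q, so they are never accepted.
    items = sorted(
        ((pi, qi) for pi, qi in zip(p, q) if pi > 0.0),
        key=lambda item: item[1] / item[0],  # cheapest likelihood ratio first
    )
    type_two = 0.0  # running value of tr(Q sigma)
    for pi, qi in items:
        if need <= 0.0:
            break
        frac = min(1.0, need / pi)  # accept this outcome fully or fractionally
        type_two += frac * qi
        need -= frac * pi
    return math.inf if type_two == 0.0 else -math.log2(type_two)


if __name__ == "__main__":
    print(hypothesis_testing_relative_entropy([0.9, 0.1], [0.2, 0.8], 0.05))
```

For non-commuting states the same quantity is a semidefinite program over $Q$; the greedy solution above does not carry over to that case.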

Lemma 1 (Properties of $D_H^\epsilon(\cdot\|\cdot)$).

1. Relation to hypothesis testing:

$$D_H^\epsilon(\rho\|\sigma) = -\log p^*(\rho, \sigma, \epsilon).$$

2. Positivity: for any $\rho$, $\sigma$, and $\epsilon \in [0, 1]$,

$$D_H^\epsilon(\rho\|\sigma) \ge 0,$$

with equality if and only if $\rho = \sigma$ and $\epsilon = 0$.

3. Relation to Rényi's relative entropy:

$$D_H^0(\rho\|\sigma) = -\log \mathrm{tr}(\Pi\sigma) = D_0(\rho\|\sigma),$$

where $\Pi$ denotes the projector onto the support of $\rho$,
and where $D_0(\rho\|\sigma)$ denotes Rényi's relative entropy
of order 0.

4. Data Processing Inequality (DPI): for any states $\rho$
and $\sigma$, any Completely Positive Map (CPM) $\mathcal{E}$ acting
on them, and any $\epsilon \in [0, 1]$,

$$D_H^\epsilon(\rho\|\sigma) \ge D_H^\epsilon(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)).$$

Proof. The first three properties are immediate from the
definition of $D_H^\epsilon(\rho\|\sigma)$. We next prove the DPI. Consider
any POVM measurement to distinguish $\mathcal{E}(\rho)$ from
$\mathcal{E}(\sigma)$. We construct a new POVM to distinguish $\rho$ from
$\sigma$ by preceding the given POVM with the CPM $\mathcal{E}$. This
new POVM clearly gives the same error probabilities (in
distinguishing $\rho$ and $\sigma$) as the given POVM (in distinguishing
$\mathcal{E}(\rho)$ and $\mathcal{E}(\sigma)$). Thus we have

$$p^*(\rho, \sigma, \epsilon) \le p^*(\mathcal{E}(\rho), \mathcal{E}(\sigma), \epsilon),$$

which, combined with Property 1, yields the DPI. □

The Quantum Stein's Lemma [5] shows how $D_H^\epsilon(\rho\|\sigma)$
is related to the normal quantum relative entropy
$D(\rho\|\sigma)$. We restate this lemma in the following way to
highlight this relation.

Lemma 2 (Quantum Stein's Lemma). For any two
states $\rho$ and $\sigma$ on a Hilbert space, and any $\epsilon \in (0, 1)$,

$$\lim_{n\to\infty} \frac{1}{n} D_H^\epsilon(\rho^{\otimes n}\|\sigma^{\otimes n}) = D(\rho\|\sigma).$$
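
The convergence stated in Lemma 2 can be observed numerically in the commuting case. The sketch below assumes two Bernoulli distributions in place of $\rho$ and $\sigma$ and groups the $2^n$ sequences by their number of ones, since all sequences in a group share one likelihood ratio. The function names, the parameters 0.3 and 0.6, and the base-2 logarithm are illustrative choices, not taken from the text.

```python
import math


def kl_bernoulli_bits(a, b):
    """Relative entropy D(Bern(a)||Bern(b)) in bits, for 0 < a, b < 1."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))


def dh_rate_bernoulli(a, b, n, eps):
    """Return (1/n) D_H^eps(P^n||Q^n) in bits for P = Bern(a), Q = Bern(b)."""
    groups = []
    for k in range(n + 1):
        # log of the binomial coefficient C(n, k), computed stably
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_p = log_comb + k * math.log(a) + (n - k) * math.log(1 - a)
        log_q = log_comb + k * math.log(b) + (n - k) * math.log(1 - b)
        # (log-likelihood ratio, group mass under P^n, group mass under Q^n)
        groups.append((log_q - log_p, math.exp(log_p), math.exp(log_q)))
    groups.sort()  # accept groups with the smallest ratio q/p first
    need, type_two = 1.0 - eps, 0.0
    for _, pk, qk in groups:
        if need <= 0.0:
            break
        if pk <= 0.0:  # mass underflowed to zero; contributes nothing
            continue
        frac = min(1.0, need / pk)
        type_two += frac * qk
        need -= frac * pk
    return -math.log2(type_two) / n


if __name__ == "__main__":
    print("D(P||Q) =", kl_bernoulli_bits(0.3, 0.6))
    for n in (10, 100, 1000):
        print(n, dh_rate_bernoulli(0.3, 0.6, n, 0.1))
```

For fixed $\epsilon$ the normalized quantity approaches $D(\rho\|\sigma)$ only slowly in $n$, which is why the one-shot bounds carry more information than the limit alone.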



III. MAIN RESULTS 

A channel consists of a set of quantum states available
as input, where each state is labeled by a different $x \in \mathcal{X}$.
Throughout this paper, we assume the cardinality of $\mathcal{X}$ to
be finite. We define a family of normalized and mutually
orthogonal vectors $|x\rangle \in A$, parameterized by $x$. When
the state labeled $x$ is fed into the channel, the output
state is denoted as $\rho_x$, which is a density operator acting
on some Hilbert space $B$. The channel can be described
by a CPM $W$ from $\mathcal{S}(A)$ to $\mathcal{S}(B)$, where $\mathcal{S}(\cdot)$ denotes the
set of density operators on a Hilbert space, such that

$$\rho_x = W(|x\rangle\langle x|).$$

For any probability mass function on $\mathcal{X}$ given by
$P_X = \{p_x\}_{x\in\mathcal{X}}$ we denote

$$\pi^{AB} \triangleq \sum_{x\in\mathcal{X}} p_x\, |x\rangle\langle x|^A \otimes \rho_x^B,$$

and denote the partial traces of $\pi^{AB}$ by $\pi^A$ and $\pi^B$,
respectively, i.e.,

$$\pi^A = \sum_{x\in\mathcal{X}} p_x\, |x\rangle\langle x|^A, \qquad \pi^B = \sum_{x\in\mathcal{X}} p_x\, \rho_x^B.$$

A codebook of size $m$ is a list of input labels $\{x_i\}$,
$i \in \{1, \ldots, m\}$. A corresponding decoding POVM acts on $B$
and has $m$ elements. We say a decoding error occurs if
the state labeled $x_i$ is fed into the channel but the output of the
decoding POVM is not $i$. An $(m, \epsilon)$-code consists of a
codebook of size $m$ and a corresponding decoding POVM
such that, when the message is chosen uniformly, the
average probability of a decoding error is at most $\epsilon$ [13].

We are now ready to prove one-shot converse and
achievability bounds, expressed using $D_H^\epsilon(\rho\|\sigma)$, on the
amount of information that can be transmitted through
a quantum channel. We begin with the converse bound.



Theorem 1 (Converse). If a $(2^R, \epsilon)$-code exists, then

$$R \le \sup_{P_X} D_H^\epsilon(\pi^{AB}\|\pi^A \otimes \pi^B). \tag{1}$$

We shall give two simple proofs for Theorem 1. The
idea of the first proof is to construct a hypothesis test
between $\pi^{AB}$ and $\pi^A \otimes \pi^B$.



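Proof 1. We choose a uniform distribution on the $x$'s
used in the codebook. This yields the state

$$\pi^{AB} = 2^{-R} \sum_{i=1}^{2^R} |x_i\rangle\langle x_i|^A \otimes \rho_{x_i}^B. \tag{2}$$

To prove (1), it suffices to show

$$R \le D_H^\epsilon(\pi^{AB}\|\pi^A \otimes \pi^B) \tag{3}$$

for the above $\pi^{AB}$. To this end, denote the decoding
POVM matrices by $\{E_i\}$. Let

$$Q \triangleq \sum_{i=1}^{2^R} |x_i\rangle\langle x_i|^A \otimes E_i^B.$$

On the one hand, it is obvious that

$$0 \le Q \le I;$$

on the other hand, we can check that, because the average
probability of error is not larger than $\epsilon$,

$$\mathrm{tr}(Q\pi^{AB}) = 2^{-R} \sum_{i=1}^{2^R} \mathrm{tr}(E_i \rho_{x_i}) \ge 1 - \epsilon.$$

Thus, to prove the theorem, it suffices to show

$$R \le -\log \mathrm{tr}\big(Q(\pi^A \otimes \pi^B)\big).$$

In fact, this inequality holds with equality, as we justify
as follows:

$$\mathrm{tr}\big(Q(\pi^A \otimes \pi^B)\big) = \mathrm{tr}\Big(2^{-R} \sum_{i=1}^{2^R} E_i \cdot 2^{-R} \sum_{j=1}^{2^R} \rho_{x_j}\Big) = 2^{-R}\, \mathrm{tr}\Big(2^{-R} \sum_{j=1}^{2^R} \rho_{x_j}\Big) = 2^{-R}.$$

□

The main technique we need for proving Theorem 2
is the following lemma by Hayashi and Nagaoka [3,
Lemma 2]:

Lemma 3. For any positive real $c$ and any operators
$0 \le S \le I$ and $T \ge 0$, we have

$$I - (S + T)^{-1/2}\, S\, (S + T)^{-1/2} \le (1 + c)(I - S) + (2 + c + c^{-1})\, T.$$

We next give the second proof of Theorem 1, which uses the DPI for
$D_H^\epsilon(\rho\|\sigma)$.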



Proof 2. As in Proof 1, we choose a uniform distribution
on the $x$'s used in the codebook and show that (3) holds
for the state $\pi^{AB}$ as in (2). To this end, it is enough to
show that

$$R \le D_H^\epsilon(P_{MM'}\|P_M \times P_{M'}), \tag{4}$$

where $P_{MM'}$ is a (classical) state denoting the joint distribution
of the transmitted message $M$ and the decoder's
guess $M'$. Indeed, the decoding POVM combined with
the inverse of the encoding map can be viewed as a CPM
which maps $\pi^{AB}$ to $P_{MM'}$ and which maps $\pi^A \otimes \pi^B$ to
$P_M \times P_{M'}$. Thus it follows from the DPI for $D_H^\epsilon(\rho\|\sigma)$
that

$$D_H^\epsilon(P_{MM'}\|P_M \times P_{M'}) \le D_H^\epsilon(\pi^{AB}\|\pi^A \otimes \pi^B),$$

and hence (4) implies (3).

To prove (4), we suggest a (possibly suboptimal)
scheme to distinguish between $P_{MM'}$ and $P_M \times P_{M'}$. The
scheme guesses $P_{MM'}$ if $M = M'$, and guesses $P_M \times P_{M'}$
otherwise. In this scheme, the probability of guessing
$P_M \times P_{M'}$ when $P_{MM'}$ is true is exactly the probability
that $M \ne M'$ computed from $P_{MM'}$, namely, the average
probability of a decoding error, and is thus not larger
than $\epsilon$ by assumption. On the other hand, the probability
of guessing $P_{MM'}$ when $P_M \times P_{M'}$ is true is

$$(P_M \times P_{M'})\{M = M'\} = \sum_{i=1}^{2^R} P_M(i) \cdot P_{M'}(i) = 2^{-R} \sum_{i=1}^{2^R} P_{M'}(i) = 2^{-R}.$$

Thus we obtain (4) and hence (3). □
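
The key step of Proof 2, that the test "accept $P_{MM'}$ iff $M = M'$" has second-kind error exactly $2^{-R}$ whatever the decoder does, can be checked numerically. The sketch below assumes a uniform message and takes a hypothetical table `cond[i][j]` of decoder probabilities Pr[M' = j | M = i]; these names and the example numbers are illustrative only.

```python
def equality_test_errors(joint):
    """Error probabilities of the test 'guess P_MM' iff M == M''.

    joint[i][j] is P_MM'(i, j). Returns (type_one, type_two), where
    type_one = Pr[M != M'] under P_MM' and
    type_two = Pr[M == M'] under the product P_M x P_M'.
    """
    m = len(joint)
    p_m = [sum(joint[i]) for i in range(m)]  # marginal of M
    p_mp = [sum(joint[i][j] for i in range(m)) for j in range(m)]  # marginal of M'
    type_one = 1.0 - sum(joint[i][i] for i in range(m))
    type_two = sum(p_m[i] * p_mp[i] for i in range(m))
    return type_one, type_two


def joint_from_decoder(cond):
    """Joint distribution for a uniform message and decoder rows cond[i]."""
    m = len(cond)
    return [[cond[i][j] / m for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    cond = [
        [0.9, 0.1, 0.0, 0.0],
        [0.2, 0.7, 0.1, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.3, 0.3, 0.2, 0.2],
    ]
    print(equality_test_errors(joint_from_decoder(cond)))
```

With four messages ($R = 2$) the second-kind error is 1/4 regardless of the decoder rows, while the first-kind error equals the average decoding error.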



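Theorem 2 (Achievability). For any $\epsilon > \epsilon' > 0$ and
$c > 0$ there exists a $(2^R, \epsilon)$-code with

$$R \ge \sup_{P_X} D_H^{\epsilon'}(\pi^{AB}\|\pi^A \otimes \pi^B) - \log \frac{2 + c + c^{-1}}{\epsilon - (1 + c)\epsilon'}. \tag{5}$$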



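Proof of Theorem 2. Fix $\epsilon' \in [0, \epsilon)$ and $c > 0$. For any
$P_X$, we shall first show that, for any $Q$ acting on $AB$
such that $0 \le Q \le I$ and $\mathrm{tr}(Q\pi^{AB}) \ge 1 - \epsilon'$, there exist
a codebook and a decoding POVM which satisfy

$$\Pr(\text{error}) \le (1 + c)\epsilon' + (2 + c + c^{-1})(2^R - 1)\, \mathrm{tr}\big(Q(\pi^A \otimes \pi^B)\big). \tag{6}$$

To this end, for any such $Q$, we define

$$A_x \triangleq \mathrm{tr}_A\big((|x\rangle\langle x|^A \otimes I^B)\, Q\big).$$

We randomly generate a codebook by choosing the codewords
identically and independently according to $P_X$.
Let the corresponding decoding POVM have elements

$$E_i = \Big(\sum_{j=1}^{2^R} A_{x_j}\Big)^{-1/2} A_{x_i} \Big(\sum_{j=1}^{2^R} A_{x_j}\Big)^{-1/2}.$$

For a specific codebook $\{x_j\}$ and the transmitted codeword
$x_i$, the probability of error is given by

$$\Pr(\text{error} \mid x_i, \{x_j\}) = \mathrm{tr}\big((I - E_i)\rho_{x_i}\big).$$

Using Lemma 3 with $S = A_{x_i}$ and $T = \sum_{j \ne i} A_{x_j}$ we obtain

$$\Pr(\text{error} \mid x_i, \{x_j\}) \le (1 + c)\big(1 - \mathrm{tr}(A_{x_i}\rho_{x_i})\big) + (2 + c + c^{-1}) \sum_{j \ne i} \mathrm{tr}(A_{x_j}\rho_{x_i}).$$

Averaging over all codebooks we have

$$\Pr(\text{error} \mid x_i) \le (1 + c)\big(1 - \mathrm{tr}(A_{x_i}\rho_{x_i})\big) + (2 + c + c^{-1})(2^R - 1)\, \mathrm{tr}\Big(\Big(\sum_{x\in\mathcal{X}} p_x A_x\Big)\rho_{x_i}\Big).$$

Further averaging the above inequality over the transmitted
codeword $x_i$ we obtain

$$\Pr(\text{error}) \le (1 + c)\Big(1 - \sum_{x} p_x\, \mathrm{tr}(A_x\rho_x)\Big) + (2 + c + c^{-1})(2^R - 1)\, \mathrm{tr}\Big(\Big(\sum_{x} p_x A_x\Big)\Big(\sum_{x} p_x \rho_x\Big)\Big). \tag{7}$$

To see that this is the desired inequality, we first check

$$1 - \epsilon' \le \mathrm{tr}(Q\pi^{AB}) = \sum_{x} p_x\, \mathrm{tr}\big(Q\,(|x\rangle\langle x|^A \otimes \rho_x^B)\big) = \sum_{x} p_x\, \mathrm{tr}(A_x\rho_x), \tag{8}$$

and then check

$$\mathrm{tr}\big(Q(\pi^A \otimes \pi^B)\big) = \sum_{x'} p_{x'}\, \mathrm{tr}\Big(Q\Big(|x'\rangle\langle x'|^A \otimes \sum_{x} p_x \rho_x\Big)\Big) = \sum_{x'} p_{x'}\, \mathrm{tr}\Big(A_{x'} \sum_{x} p_x \rho_x\Big) = \mathrm{tr}\Big(\Big(\sum_{x'} p_{x'} A_{x'}\Big)\Big(\sum_{x} p_x \rho_x\Big)\Big). \tag{9}$$

Using (7), (8) and (9) we see that (6) holds for the average
probability of error averaged over the class of codebooks
we generated. Thus there must exist at least one
codebook that satisfies (6). Furthermore, since a codebook
that satisfies (6) can be found for any $Q$ satisfying
$0 \le Q \le I$ and $\mathrm{tr}(Q\pi^{AB}) \ge 1 - \epsilon'$, we conclude that
there must exist a codebook that satisfies

$$\Pr(\text{error}) \le (1 + c)\epsilon' + (2 + c + c^{-1})\, 2^{R - D_H^{\epsilon'}(\pi^{AB}\|\pi^A \otimes \pi^B)}.$$

By rearranging terms in the above inequality we obtain (5). □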
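
The rearrangement in the last step can be spelled out as follows. This is a sketch of the algebra only; the shorthand $D$ for $D_H^{\epsilon'}(\pi^{AB}\|\pi^A \otimes \pi^B)$ is introduced here for brevity, and the steps assume $\epsilon > (1 + c)\epsilon'$, which the bound needs in order to be meaningful.

```latex
% Shorthand (introduced for this sketch): D = D_H^{\epsilon'}(\pi^{AB} \| \pi^A \otimes \pi^B).
% Requiring the right-hand side of the error bound to be at most \epsilon:
\begin{align*}
(1 + c)\epsilon' + (2 + c + c^{-1})\, 2^{R - D} \le \epsilon
  &\iff 2^{R - D} \le \frac{\epsilon - (1 + c)\epsilon'}{2 + c + c^{-1}} \\
  &\iff R \le D - \log \frac{2 + c + c^{-1}}{\epsilon - (1 + c)\epsilon'} .
\end{align*}
% Choosing R equal to the right-hand side and taking the supremum over P_X
% gives a (2^R, \epsilon)-code satisfying the achievability bound.
```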



IV. ASYMPTOTIC ANALYSIS 



The results of Section III apply to the transmission
of a message in a single use of a channel. Obviously, a
channel that is used $n$ times can always be modeled as one
big single-use channel. We can thus employ Theorems 1
and 2 to derive the known asymptotic expressions for
the capacity of channels that are used many times. The
simplest such case is that of a memoryless channel. Here
we can directly apply Stein's lemma to recover the well-known
Holevo-Schumacher-Westmoreland Theorem [1], [2],
which says that the capacity of a memoryless channel is
given by

$$C = \sup_{n} \frac{1}{n} \sup_{P_{X^n}} D\big(\pi^{A^{\otimes n}B^{\otimes n}}\,\big\|\,\pi^{A^{\otimes n}} \otimes \pi^{B^{\otimes n}}\big) = \sup_{n} \frac{1}{n} \sup_{P_{X^n}} I(A^{\otimes n}; B^{\otimes n}),$$

where $I(A^{\otimes n}; B^{\otimes n})$ denotes the mutual information between
$A^{\otimes n}$ and $B^{\otimes n}$.

In the following, we show how analogous asymptotic
formulas can be obtained for a channel whose structure
is arbitrary. Such a channel is described by CPMs from
$A^{\otimes n}$ to $B^{\otimes n}$ for all $n \in \mathbb{Z}^+$, where $A$ is the space of the
labels of the input states, and where $B$ is the space of
the output states. An $(n, m, \epsilon)$-code on a channel consists
of a codebook with entries $(x_{i,1}, \ldots, x_{i,n}) \in \mathcal{X}^n$,
$i \in \{1, \ldots, m\}$, and a decoding POVM acting on $B^{\otimes n}$
such that the average probability of error is no larger than $\epsilon$.
We define capacity and optimistic capacity in the same
way as in classical information theory [3], [11].

Definition 2 (Capacity and Optimistic Capacity). The
capacity $C$ of a channel is the supremum over all $R$ for
which there exists a sequence of $(n, 2^{nR}, \epsilon_n)$-codes such
that

$$\lim_{n\to\infty} \epsilon_n = 0.$$

The optimistic capacity $\bar{C}$ of a channel is the supremum
over all $R$ for which there exists a sequence of
$(n, 2^{nR}, \epsilon_n)$-codes such that

$$\liminf_{n\to\infty} \epsilon_n = 0.$$

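Given Definition 2, the next theorem is an immediate
consequence of Theorems 1 and 2.

Theorem 3 (General Capacity Formulas). For any
quantum communication channel whose law is described
by a sequence of CPMs $W_n : \mathcal{S}(A^{\otimes n}) \to \mathcal{S}(B^{\otimes n})$, its
capacity is given by

$$C = \lim_{\epsilon\to 0} \liminf_{n\to\infty} \frac{1}{n} \sup_{P_{X^n}} D_H^\epsilon\big(\pi^{A^{\otimes n}B^{\otimes n}}\,\big\|\,\pi^{A^{\otimes n}} \otimes \pi^{B^{\otimes n}}\big),$$

and its optimistic capacity is given by

$$\bar{C} = \lim_{\epsilon\to 0} \limsup_{n\to\infty} \frac{1}{n} \sup_{P_{X^n}} D_H^\epsilon\big(\pi^{A^{\otimes n}B^{\otimes n}}\,\big\|\,\pi^{A^{\otimes n}} \otimes \pi^{B^{\otimes n}}\big).$$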



Theorem 3 is equivalent to [3, Theorem 1], though the
expressions used in the two works are completely different.
The (non-operational) proof of the equivalence of
the expressions used in Theorem 3 and those used in [3,
Theorem 1] is similar to the proof of @, Theorem 3] or 
the classical version [J, Lemma 2] , and is omitted in this 
paper. 

We can also use Theorems 1 and 2 to study the $\epsilon$-capacities,
which are usually defined as follows in classical
information theory (see, for example, [12]):

Definition 3 ($\epsilon$-Capacity and Optimistic $\epsilon$-Capacity).
The $\epsilon$-capacity $C_\epsilon$ of a channel is the supremum over
all $R$ such that, for every large enough $n$, there exists
an $(n, 2^{nR}, \epsilon)$-code. The optimistic $\epsilon$-capacity $\bar{C}_\epsilon$ of a
channel is the supremum over all $R$ for which there exist
$(n, 2^{nR}, \epsilon)$-codes for infinitely many $n$'s.

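Then from Theorems 1 and 2 we immediately have the
following:

Theorem 4 (Bounds on $\epsilon$-Capacities). For any quantum
communication channel and any $\epsilon \in (0, 1)$, the $\epsilon$-capacity
of the channel satisfies

$$C_\epsilon \le \liminf_{n\to\infty} \frac{1}{n} \sup_{P_{X^n}} D_H^\epsilon\big(\pi^{A^{\otimes n}B^{\otimes n}}\,\big\|\,\pi^{A^{\otimes n}} \otimes \pi^{B^{\otimes n}}\big),$$

$$C_\epsilon \ge \lim_{\epsilon'\uparrow\epsilon} \liminf_{n\to\infty} \frac{1}{n} \sup_{P_{X^n}} D_H^{\epsilon'}\big(\pi^{A^{\otimes n}B^{\otimes n}}\,\big\|\,\pi^{A^{\otimes n}} \otimes \pi^{B^{\otimes n}}\big),$$

and the optimistic $\epsilon$-capacity of the channel satisfies

$$\bar{C}_\epsilon \le \limsup_{n\to\infty} \frac{1}{n} \sup_{P_{X^n}} D_H^\epsilon\big(\pi^{A^{\otimes n}B^{\otimes n}}\,\big\|\,\pi^{A^{\otimes n}} \otimes \pi^{B^{\otimes n}}\big),$$

$$\bar{C}_\epsilon \ge \lim_{\epsilon'\uparrow\epsilon} \limsup_{n\to\infty} \frac{1}{n} \sup_{P_{X^n}} D_H^{\epsilon'}\big(\pi^{A^{\otimes n}B^{\otimes n}}\,\big\|\,\pi^{A^{\otimes n}} \otimes \pi^{B^{\otimes n}}\big).$$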
ered. For finite block-length analysis, one can construct
a code that has maximum probability of error not larger
than $2\epsilon$ from a code of average probability of error $\epsilon$ by